Temporal convolutional networks (TCNs) are deep learning models that use dilated, causal 1D convolutions for sequence modeling tasks.
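For readers unfamiliar with the mechanics, here is a minimal PyTorch sketch of the dilated causal convolution at the heart of a TCN; the layer widths and dilation schedule are illustrative, not tied to any paper below.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """Dilated 1D convolution with left-padding so outputs never see the future."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # amount of left-padding for causality
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                        # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))  # pad only on the left
        return self.conv(x)

# Stacking blocks with exponentially growing dilation widens the receptive field.
tcn = nn.Sequential(
    CausalConv1d(1, 16, dilation=1), nn.ReLU(),
    CausalConv1d(16, 16, dilation=2), nn.ReLU(),
    CausalConv1d(16, 16, dilation=4), nn.ReLU(),
)
out = tcn(torch.randn(8, 1, 100))  # (8, 16, 100): sequence length is preserved
```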
Rapid and accurate building damage assessment in the immediate aftermath of tornadoes is critical for coordinating life-saving search and rescue operations, optimizing emergency resource allocation, and accelerating community recovery. However, current automated methods struggle with the unique visual complexity of tornado-induced wreckage, primarily due to severe domain shift from standard pre-training datasets and extreme class imbalance in real-world disaster data. To address these challenges, we introduce a systematic experimental framework evaluating 79 open-source deep learning models, encompassing both Convolutional Neural Networks (CNNs) and Vision Transformers, across over 2,300 controlled experiments on our newly curated Quad-State Tornado Damage (QSTD) benchmark dataset. Our findings reveal that achieving operational-grade performance hinges on a complex interaction between architecture and optimization, rather than architectural selection alone. Most strikingly, we demonstrate that optimizer choice can be more consequential than architecture: switching from Adam to SGD provided dramatic F1 gains of +25 to +38 points for Vision Transformer and Swin Transformer families, fundamentally reversing their ranking from bottom-tier to competitive with top-performing CNNs. Furthermore, a low learning rate of 1x10^(-4) proved universally critical, boosting average F1 performance by +10.2 points across all architectures. Our champion model, ConvNeXt-Base trained with these optimized settings, demonstrated strong cross-event generalization on the held-out Tuscaloosa-Moore Tornado Damage (TMTD) dataset, achieving 46.4% Macro F1 (+34.6 points over baseline) and retaining 85.5% Ordinal Top-1 Accuracy despite temporal and sensor domain shifts.
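The headline finding here is a training-recipe effect rather than a new architecture. A minimal sketch of the two optimizer configurations being compared, assuming the timm library provides the ConvNeXt-Base backbone; the class count is a placeholder, not the paper's label set.

```python
import torch
import timm

# Stand-in for the champion backbone; num_classes=4 is a placeholder damage scale.
model = timm.create_model("convnext_base", pretrained=True, num_classes=4)

# Baseline configuration: Adam at a typical default learning rate.
opt_adam = torch.optim.Adam(model.parameters(), lr=1e-3)

# Reported winning configuration: SGD with momentum at the low 1e-4 learning rate.
opt_sgd = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
```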
Small datasets like MNIST have historically been instrumental in advancing machine learning research by providing a controlled environment for rapid experimentation and model evaluation. However, their simplicity often limits their utility for distinguishing between advanced neural network architectures. To address this limitation, Greydanus et al. introduced the MNIST-1D dataset, a one-dimensional adaptation of MNIST designed to explore inductive biases in sequential data. This dataset maintains the advantages of small-scale datasets while introducing variability and complexity that make it ideal for studying advanced architectures. In this paper, we extend the exploration of MNIST-1D by evaluating the performance of Residual Networks (ResNet), Temporal Convolutional Networks (TCN), and Dilated Convolutional Neural Networks (DCNN). These models, known for their ability to capture sequential patterns and hierarchical features, were implemented and benchmarked alongside previously tested architectures such as logistic regression, MLPs, CNNs, and GRUs. Our experimental results demonstrate that advanced architectures like TCN and DCNN consistently outperform simpler models, achieving near-human performance on MNIST-1D. ResNet also shows significant improvements, highlighting the importance of leveraging inductive biases and hierarchical feature extraction on small, structured datasets. Through this study, we validate the utility of MNIST-1D as a robust benchmark for evaluating machine learning architectures under computational constraints. Our findings emphasize the role of architectural innovations in improving model performance and offer insights into optimizing deep learning models for resource-limited environments.
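As an illustration of the kind of architecture benchmarked here, a minimal dilated 1D CNN sized for MNIST-1D's length-40 signals and 10 classes; the exact layer configuration in the paper may differ.

```python
import torch
import torch.nn as nn

# MNIST-1D examples are length-40 1D signals with 10 digit classes.
class DCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),           # global average pooling over time
        )
        self.head = nn.Linear(32, 10)

    def forward(self, x):                      # x: (batch, 1, 40)
        return self.head(self.features(x).squeeze(-1))

logits = DCNN()(torch.randn(16, 1, 40))        # (16, 10)
```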
Reactive anomaly detection methods, which are commonly deployed to identify anomalies after they occur based on observed deviations, often fall short in applications that demand timely intervention, such as industrial monitoring, finance, and cybersecurity. Proactive anomaly detection, by contrast, aims to detect early warning signals before failures fully manifest, but existing methods struggle with handling heterogeneous multivariate data and maintaining precision under noisy or unpredictable conditions. In this work, we introduce two proactive anomaly detection frameworks: the Forward Forecasting Model (FFM) and the Backward Reconstruction Model (BRM). Both models leverage a hybrid architecture combining Temporal Convolutional Networks (TCNs), Gated Recurrent Units (GRUs), and Transformer encoders to model directional temporal dynamics. FFM forecasts future sequences to anticipate disruptions, while BRM reconstructs recent history from future context to uncover early precursors. Anomalies are flagged based on forecasting error magnitudes and directional embedding discrepancies. Our models support both continuous and discrete multivariate features, enabling robust performance in real-world settings. Extensive experiments on four benchmark datasets, MSL, SMAP, SMD, and PSM, demonstrate that FFM and BRM outperform state-of-the-art baselines across detection metrics and significantly improve the timeliness of anomaly anticipation. These properties make our approach well-suited for deployment in time-sensitive domains requiring proactive monitoring.
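The flagging rule blends two signals; below is a simplified sketch of how forecast error and directional embedding discrepancy might be combined and thresholded. The equal weighting and the 99th-percentile threshold are assumptions, not the paper's exact rule.

```python
import numpy as np

def anomaly_scores(x_true, x_pred, emb_fwd, emb_bwd, alpha=0.5):
    """Blend per-step forecast error with forward/backward embedding discrepancy."""
    err = np.mean((x_true - x_pred) ** 2, axis=-1)              # per-step MSE
    # cosine discrepancy between directional embeddings (1 - cosine similarity)
    cos = np.sum(emb_fwd * emb_bwd, axis=-1) / (
        np.linalg.norm(emb_fwd, axis=-1) * np.linalg.norm(emb_bwd, axis=-1) + 1e-8)
    return alpha * err + (1 - alpha) * (1 - cos)

# Stand-in data: 100 time steps, 8 observed features, 16-dim embeddings.
scores = anomaly_scores(np.random.randn(100, 8), np.random.randn(100, 8),
                        np.random.randn(100, 16), np.random.randn(100, 16))
flags = scores > np.quantile(scores, 0.99)     # flag the highest-scoring 1% of steps
```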
Many works have investigated radio map and path loss prediction in wireless networks using deep learning, in particular using convolutional neural networks. However, most assume perfect environment information, which is unrealistic in practice due to sensor limitations, mapping errors, and temporal changes. We demonstrate that convolutional neural networks trained with task-specific perturbations of geometry, materials, and Tx positions can implicitly compensate for prediction errors caused by inaccurate environment inputs. When tested with noisy inputs on synthetic indoor scenarios, models trained with perturbed environment data reduce error by up to 25\% compared to models trained on clean data. We verify our approach on real-world measurements, achieving 2.1 dB RMSE with noisy input data and 1.3 dB with complete information, compared to 2.3-3.1 dB for classical methods such as ray-tracing and radial basis function interpolation. Additionally, we compare different ways of encoding environment information at varying levels of detail and we find that, in the considered single-room indoor scenarios, binary occupancy encoding performs at least as well as detailed material property information, simplifying practical deployment.
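A minimal sketch of the training-time perturbation idea: corrupt the environment inputs before each training step so the network learns to tolerate mapping and localization errors. The noise magnitudes here are illustrative, not the task-specific distributions used in the paper.

```python
import numpy as np

def perturb_environment(occupancy, tx_pos, flip_prob=0.05, tx_sigma=0.1, rng=None):
    """Randomly corrupt environment inputs so the CNN tolerates imperfect maps."""
    rng = rng or np.random.default_rng()
    # flip a small fraction of occupancy-grid cells (geometry/mapping errors)
    flips = rng.random(occupancy.shape) < flip_prob
    noisy_occ = np.where(flips, 1 - occupancy, occupancy)
    # jitter the transmitter position (localization error, in meters)
    noisy_tx = tx_pos + rng.normal(0.0, tx_sigma, size=tx_pos.shape)
    return noisy_occ, noisy_tx

occ, tx = perturb_environment(np.zeros((64, 64)), np.array([3.2, 1.5]))
```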
Time series forecasting is a fundamental tool with wide-ranging applications, yet recent debates question whether complex nonlinear architectures truly outperform simple linear models. Prior claims of linear-model dominance often stem from benchmarks that lack diverse temporal dynamics and employ biased evaluation protocols. We revisit this debate through TimeSynth, a structured framework that emulates key properties of real-world time series, including non-stationarity, periodicity, trends, and phase modulation, by creating synthesized signals whose parameters are derived from real-world time series. Evaluating four model families (Linear, Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and Transformers), we find a systematic bias in linear models: they collapse to simple oscillation regardless of signal complexity. Nonlinear models avoid this collapse and gain clear advantages as signal complexity increases. Notably, Transformer- and CNN-based models exhibit slightly greater adaptability to complex modulated signals than MLPs. Beyond clean forecasting, the framework highlights robustness differences under distribution and noise shifts and removes the biases of prior benchmarks by using independent instances for training, validation, and testing within each signal family. Collectively, TimeSynth provides a principled foundation for understanding when different forecasting approaches succeed or fail, moving beyond oversimplified claims of model equivalence.
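A minimal sketch of synthesizing one such signal, combining a trend, a phase-modulated periodic component, and noise; the parameter values are illustrative, not TimeSynth's.

```python
import numpy as np

def synth_signal(n=1024, trend=0.002, freq=0.05, mod_depth=2.0, mod_freq=0.003,
                 noise=0.1, rng=None):
    """Linear trend + phase-modulated sinusoid + Gaussian noise."""
    rng = rng or np.random.default_rng()
    t = np.arange(n)
    # phase modulation: a slow sinusoid perturbs the carrier's phase
    phase = 2 * np.pi * freq * t + mod_depth * np.sin(2 * np.pi * mod_freq * t)
    return trend * t + np.sin(phase) + rng.normal(0, noise, n)

x = synth_signal()   # one length-1024 synthetic series
```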
Human pose estimation is fundamental to intelligent perception in the Internet of Things (IoT), enabling applications ranging from smart healthcare to human-computer interaction. While WiFi-based methods have gained traction, they often struggle with continuous motion and high computational overhead. This work presents WiFlow, a novel framework for continuous human pose estimation using WiFi signals. Unlike vision-based approaches such as two-dimensional deep residual networks that treat Channel State Information (CSI) as images, WiFlow employs an encoder-decoder architecture. The encoder captures spatio-temporal features of CSI using temporal and asymmetric convolutions, preserving the original sequential structure of the signals. It then refines the keypoint features of the tracked human bodies and captures their structural dependencies via axial attention. The decoder subsequently maps the encoded high-dimensional features into keypoint coordinates. Trained on a self-collected dataset of 360,000 synchronized CSI-pose samples from 5 subjects performing continuous sequences of 8 daily activities, WiFlow achieves a Percentage of Correct Keypoints (PCK) of 97.00% at a threshold of 20% (PCK@20) and 99.48% at PCK@50, with a mean per-joint position error of 0.008m. With only 4.82M parameters, WiFlow significantly reduces model complexity and computational cost, establishing a new performance baseline for practical WiFi-based human pose estimation. Our code and datasets are available at https://github.com/DY2434/WiFlow-WiFi-Pose-Estimation-with-Spatio-Temporal-Decoupling.git.
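A minimal sketch of axial attention over a (time, keypoint) feature map, attending along one axis at a time with standard multi-head attention; dimensions and head counts are illustrative, not WiFlow's.

```python
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    """Attend along the time and keypoint axes separately instead of jointly."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.joint_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                 # x: (batch, time, joints, dim)
        b, t, j, d = x.shape
        xt = x.permute(0, 2, 1, 3).reshape(b * j, t, d)   # attend over time, per joint
        xt, _ = self.time_attn(xt, xt, xt)
        x = xt.reshape(b, j, t, d).permute(0, 2, 1, 3)
        xj = x.reshape(b * t, j, d)                        # attend over joints, per frame
        xj, _ = self.joint_attn(xj, xj, xj)
        return xj.reshape(b, t, j, d)

out = AxialAttention(32)(torch.randn(2, 50, 17, 32))      # 17 keypoints, 50 frames
```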
Coastal hypoxia, especially in the northern Gulf of Mexico, presents a persistent ecological and economic concern. Seasonal models offer coarse forecasts that miss the fine-scale variability needed for daily, responsive ecosystem management. We present a study that compares four deep learning architectures for daily hypoxia classification: Bidirectional Long Short-Term Memory (BiLSTM), Medformer (Medical Transformer), Spatio-Temporal Transformer (ST-Transformer), and Temporal Convolutional Network (TCN). We trained our models on twelve years of daily hindcast data (2009-2020) from a coupled hydrodynamic-biogeochemical model and used hindcast data from 2020 through 2024 as test data. We constructed classification models incorporating water column stratification, sediment oxygen consumption, and temperature-dependent decomposition rates. We evaluated each architecture using the same data preprocessing, input/output formulation, and validation protocols. Each model achieved high classification accuracy and strong discriminative ability, with ST-Transformer achieving the highest performance across all metrics and test periods (AUC-ROC: 0.982-0.992). We also employed McNemar's test to identify statistically significant differences in model predictions. Our contribution is a reproducible framework for operational real-time hypoxia prediction that can support broader efforts in the environmental and ocean modeling community and in ecosystem resilience. The source code is available at https://github.com/rmagesh148/hypoxia-ai/
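McNemar's test compares two classifiers on the same samples through their disagreement counts; a minimal sketch using statsmodels, with stand-in labels and predictions.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

y_true = np.random.randint(0, 2, 500)          # stand-in hypoxia labels
pred_a = np.random.randint(0, 2, 500)          # model A predictions
pred_b = np.random.randint(0, 2, 500)          # model B predictions

# 2x2 table of correct/incorrect agreement between the two models
a_ok, b_ok = pred_a == y_true, pred_b == y_true
table = [[np.sum(a_ok & b_ok),  np.sum(a_ok & ~b_ok)],
         [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]]

result = mcnemar(table, exact=False, correction=True)
print(result.statistic, result.pvalue)         # differ significantly if pvalue < 0.05
```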
Reliable terrain perception is a critical prerequisite for the deployment of humanoid robots in unstructured, human-centric environments. While traditional systems often rely on manually engineered, single-sensor pipelines, this paper presents a learning-based framework that uses an intermediate, robot-centric heightmap representation. A hybrid Encoder-Decoder Structure (EDS) is introduced, utilizing a Convolutional Neural Network (CNN) for spatial feature extraction fused with a Gated Recurrent Unit (GRU) core for temporal consistency. The architecture integrates multimodal data from an Intel RealSense depth camera, a LIVOX MID-360 LiDAR processed via efficient spherical projection, and an onboard IMU. Quantitative results demonstrate that multimodal fusion improves reconstruction accuracy by 7.2% over depth-only and 9.9% over LiDAR-only configurations. Furthermore, the integration of a 3.2 s temporal context reduces mapping drift.
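A minimal sketch of the encoder-decoder pattern described here: a CNN encodes each multimodal frame, a GRU carries state across the temporal context, and a decoder emits the heightmap. Channel counts and resolutions are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class EDS(nn.Module):
    """CNN encoder per frame, GRU across time, linear decoder to a heightmap."""
    def __init__(self, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),  # depth + LiDAR channels
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 16 * 16, hidden))
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.dec = nn.Linear(hidden, 64 * 64)                     # flattened heightmap

    def forward(self, frames):                 # frames: (batch, time, 2, 64, 64)
        b, t = frames.shape[:2]
        z = self.enc(frames.reshape(b * t, 2, 64, 64)).reshape(b, t, -1)
        h, _ = self.gru(z)                     # temporal smoothing across the sequence
        return self.dec(h[:, -1]).reshape(b, 64, 64)

hm = EDS()(torch.randn(2, 8, 2, 64, 64))       # (2, 64, 64) robot-centric heightmaps
```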
The increasing complexity and frequency of cyber-threats demand intrusion detection systems (IDS) that are not only accurate but also interpretable. This paper presents a novel IDS framework that integrates Explainable Artificial Intelligence (XAI) to enhance transparency in deep learning models. The framework is evaluated experimentally on the benchmark NSL-KDD dataset, demonstrating superior performance compared to traditional IDS and black-box deep learning models. The proposed approach combines Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks to capture temporal dependencies in traffic sequences. Our deep learning results show that both CNN and LSTM reach 0.99 accuracy, while LSTM outperforms CNN on macro-average precision, recall, and F1-score; on the weighted averages, the two models score almost identically. To ensure interpretability, the XAI method SHapley Additive exPlanations (SHAP) is incorporated, enabling security analysts to understand and validate model decisions. Notable influential features for both models, as identified by SHAP, include srv_serror_rate, dst_host_srv_serror_rate, and serror_rate. We also conducted a trust-focused expert survey based on IPIP6 and Big Five personality traits via an interactive UI to evaluate the system's reliability and usability. This work highlights the potential of combining performance and transparency in cybersecurity solutions and recommends future enhancements through adaptive learning for real-time threat detection.
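A minimal, model-agnostic sketch of ranking influential features with SHAP; it uses KernelExplainer on a stand-in prediction function rather than the paper's exact setup, and assumes NSL-KDD's 41 base features.

```python
import numpy as np
import shap

rng = np.random.default_rng(0)
w = rng.normal(size=41)                        # NSL-KDD has 41 base features

def predict(X):                                # stand-in for the trained CNN/LSTM scores
    return 1 / (1 + np.exp(-X @ w))

X = rng.normal(size=(100, 41))
explainer = shap.KernelExplainer(predict, X[:50])   # background distribution
sv = explainer.shap_values(X[50:60])                # (10, 41) per-feature attributions

# features with the largest mean |SHAP| dominate the model's decisions
top = np.argsort(np.abs(sv).mean(axis=0))[::-1][:3]
print(top)                                     # indices of the three most influential features
```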
We propose the Convolutional Operator Network for Forward and Inverse Problems (FI-Conv), a framework capable of predicting system evolution and estimating parameters in complex spatio-temporal dynamics, such as turbulence. FI-Conv is built on a U-Net architecture, in which most convolutional layers are replaced by ConvNeXt V2 blocks. This design preserves U-Net performance on inputs with high-frequency variations while maintaining low computational complexity. FI-Conv uses an initial state, PDE parameters, and evolution time as input to predict the system's future state. As a representative example of a system exhibiting complex dynamics, we evaluate the performance of FI-Conv on the task of predicting turbulent plasma fields governed by the Hasegawa-Wakatani (HW) equations. The HW system models two-dimensional electrostatic drift-wave turbulence and exhibits strongly nonlinear behavior, making accurate approximation and long-term prediction particularly challenging. Using an autoregressive forecasting procedure, FI-Conv achieves accurate forward prediction of the plasma state evolution over short times (t ~ 3) and captures the statistical properties of derived physical quantities of interest over longer times (t ~ 100). Moreover, we develop a gradient-descent-based inverse estimation method that accurately infers PDE parameters from plasma state evolution data, without modifying the trained model weights. Collectively, our results demonstrate that FI-Conv can be an effective alternative to existing physics-informed machine learning methods for systems with complex spatio-temporal dynamics.
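The inverse method optimizes the PDE-parameter inputs through the frozen network; a minimal sketch of that pattern, with a small stand-in surrogate in place of the trained FI-Conv model.

```python
import torch
import torch.nn as nn

# Stand-in for the trained FI-Conv: maps (params, initial state) to an evolved state.
surrogate = nn.Sequential(nn.Linear(2 + 64, 64), nn.Tanh(), nn.Linear(64, 64))
for p in surrogate.parameters():
    p.requires_grad_(False)                    # freeze the trained weights

state0 = torch.randn(64)                       # observed initial state (flattened)
target = torch.randn(64)                       # observed evolved state
params = torch.zeros(2, requires_grad=True)    # unknown PDE parameters to infer

opt = torch.optim.Adam([params], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    pred = surrogate(torch.cat([params, state0]))
    loss = nn.functional.mse_loss(pred, target)  # match predicted to observed evolution
    loss.backward()                              # gradients flow to params only
    opt.step()
```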